Virus Evolution
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Accurate identification of unknown pathogens is critical for medicine and public health, yet current metagenomic workflows remain heavily dependent on specialized bioinformatics expertise and manual interpretation, creating substantial bottlenecks in time-sensitive diagnostic settings [1]. The key challenges lie in achieving precise species identification amidst high background noise and translating complex microbial data into clinically actionable insights [2,3]. Here we present the Global Pathogen A...
Herpes simplex virus (HSV) is an endemic pathogen, infecting most adults worldwide. HSV infection can cause a wide spectrum of disease outcomes, ranging from asymptomatic infection or mild lesions to rare cases of infectious keratitis, encephalitis, and death. HSV genome sequences have been shown to differ between individual patients, as well as within individuals. To date, the vast majority of publicly available HSV genomic data has come from Europe and North America. Our current understanding...
Influenza A(H3N2) subclade K virus was detected in Canada early in the 2025/26 influenza season, bearing an antigenic transition in the hemagglutinin (HA) glycoprotein. Analysis of 396 HA sequences from Canada showed antigenic divergence from 2025/26 influenza vaccine strains, consistent with partial mismatch. Phylodynamic analysis revealed sustained pre-vaccine transmission without clear post-vaccine expansion. Phylogenetic and phylogeographic analyses indicated interprovincial mixing within a ...
Strengthening in-country sequencing capacity generated 28 Lassa virus genomes from human clinical cases, expanding our knowledge of Lassa fever in Guinea. Phylogeographic analysis revealed cross-border exchange between Liberia and the NZerekore region, and a Sierra Leone introduction into the Gueckedou area. Enhanced genomic surveillance is crucial to guide future public health actions.
Bacterial infections are a major cause of morbidity and mortality among children under five in low- and middle-income countries (LMICs). Children in LMICs are exposed to and colonized by a range of pathogenic bacteria, yet patterns of bacterial exchange between humans are not well known, in part because culturing and sequencing single bacterial isolates is labor-intensive. Here, we apply a machine learning strain tracking approach to metagenomic data from 511 stool samples from children and moth...
The emergence of vaccine-covered serotypes causing invasive pneumococcal disease (IPD) is a serious concern worldwide. We investigated the unexpected rise of serotype 4 causing IPD, primarily in non-vaccinated young adults after the COVID-19 pandemic, which has further spread to adults ≥65 years in recent years. For this purpose, we conducted a retrospective study of serotype 4 IPD cases (n=827) reported in Spain between 2009 and 2024. Whole-genome sequencing was performed to assess clonal lineag...
Biological fitness quantifies the efficiency and selective advantage of pathogens and hosts in their bilateral interaction. Key questions--such as how much more infectious an emerging variant is compared with its predecessor, or how much protection vaccination offers relative to no vaccination--require fitness to be measured systematically, in real time, and ideally beyond controlled laboratory settings. We propose an approach that infers biological fitness from mostly non-biological data on inf...
Background: The hyperpolymorphic nature and structural complexity of the human leukocyte antigen (HLA) genomic region present challenges for accurate and scalable typing across diverse sample types. While whole-genome sequencing (WGS) offers the opportunity to infer HLA genotypes without targeted enrichment, systematic benchmarks across sequencing platforms, biospecimens and coverage levels remain limited. Results: We assembled a multi-platform resource of WGS datasets derived from short-read (Illum...
Influenza A subclade K viruses caused high infection rates in the 2025/2026 Northern Hemisphere season, raising concerns about antigenic drift and reduced vaccine effectiveness. We measured antibody responses in matched human pre- and post-vaccination sera against a vaccine-like as well as subclade K isolates. Pre-existing immunity to subclade K variants was noted with seasonal influenza vaccination boosting titers two-fold against subclade K and three-fold against the va...
Pathogenic organisms are typically thought to be constrained by a tradeoff between the rate and duration of transmission, an assumption that underpins a considerable body of evolutionary theory. Here we test for a transmission-duration tradeoff using detailed historical malaria infection data from an era prior to widespread use of antibiotics when humans were deliberately infected with malaria parasites as treatment for neurosyphilis (malariatherapy). These time series follow individual human in...
An H3N2 variant, named subclade K, continues to circulate widely during the 2025-2026 influenza season. This virus possesses a hemagglutinin (HA) protein that has eleven substitutions relative to the HA of the Northern Hemisphere 2025-2026 H3N2 vaccine strain. Many of these substitutions are in epitopes in well-characterized HA antigenic sites. Despite this, interim vaccine effectiveness studies indicate that the 2025-2026 influenza vaccine provides moderate protection against H3N2 subclade K in...
Background: Elimination of Plasmodium vivax is challenging due to its dormant liver stages (hypnozoites), which can reactivate weeks or months after the primary infection, causing relapses and ongoing transmission of the parasite. Despite these challenges, P. vivax clinical case numbers have declined over the past decade in Cambodia. We used parasite genotyping to assess whether the decline in case numbers was reflected in parasite diversity and relatedness as a proxy for transmission. Methods: Geno...
Genomic surveillance of influenza viruses informs vaccine strain selection and evolutionary forecasting. Sequencing efforts vary widely across U.S. states, which raises concerns about spatial sampling bias. We evaluated how well 10,958 influenza virus genomes sampled by our group in Michigan captured the genetic diversity in 34,743 genomes circulating nationally from the 2021/22 through 2024/25 seasons. We defined seasonal hemagglutinin haplotypes and tracked their detection across states. A sma...
Linezolid is a critical last-resort antimicrobial for multidrug-resistant Enterococcus faecium, particularly against vancomycin-resistant lineages where therapeutic options are severely limited. While resistance has historically arisen through de novo chromosomal mutations, the global emergence of transferable resistance mechanisms threatens to render more infections untreatable. Here, we characterise a recent (2023-2024) hospital-associated outbreak of linezolid-resistant E. faecium in Queensla...
Clade 2.3.4.4b highly pathogenic avian influenza A(H5N1) viruses continue to expand geographically and across mammalian hosts, raising concern about pandemic potential. The degree and specificity of pre-existing immunity in humans are key determinants of this risk. We analyzed hemagglutinin (HA)- and neuraminidase (NA)-specific antibody responses in 300 sera collected from adults in New York City. While HA-directed binding antibodies to clade 2.3.4.4b H5 were low and hemagglutination-inhibiting a...
In the three years since Omicron emergence, SARS-CoV-2 dynamics have exhibited persistent twice-yearly waves in the United States, peaking in late summer and winter, with heterogeneity in timing and intensity across states. This semiannual pattern sharply contrasts with typical annual respiratory pathogen dynamics in the US, yet its underlying mechanisms and whether it will persist remain poorly understood. Here, we tested several hypothesized mechanisms and found that a combination ...
Antimicrobial resistance (AMR) is a growing problem, with annual deaths set to pass 10 million by 2050 if current trends continue. Wastewater surveillance has been proposed as a strategy to understand population-level resistance, and water reclamation facilities (WRFs) have been identified as a control point for environmental dissemination of resistant bacteria. Understanding dynamics of AMR across WRFs requires advanced molecular tools that elucidate host bacteria, especially for mobile resista...
A concern in infectious disease modelling is how accurately population mixing is incorporated, as it shapes the type and frequency of contacts through which infection spreads, and consequently, estimated intervention effectiveness. Although synthesizing mixing patterns from diary-based surveys is an established framework, geographical information is poorly or sparsely captured. Here we propose a generalizable workflow to quantify geographical connectivity from job registry data covering over 8 m...
Background: Variation in the HLA loci, located on human chromosome 6p, has been associated with hundreds of diseases and conditions. However, high levels of polymorphism that characterize the HLA system, coupled with generally modest effect sizes for most phenotypes, necessitate relatively large sample sizes to power association studies; meanwhile, high resolution HLA genotyping remains relatively resource intensive. These constraints limit identification of novel associations. While phenome-wide ...
The gut microbiome has been linked to breast cancer, largely through microbial functions involved in estrogen metabolism (the "estrobolome"); however, specific microbial targets remain poorly defined in human studies. Here, we profiled the gut microbiome using whole-metagenome shotgun sequencing, and plasma and stool metabolites were quantified using targeted metabolomics, in a study of 70 postmenopausal female cases with treatment-naive ER-positive breast cancer and 70 controls. Reduced species...